Conversation

cmp0xff
Contributor

@cmp0xff cmp0xff commented Jul 13, 2025

@cmp0xff cmp0xff marked this pull request as ready for review July 13, 2025 07:05
@cmp0xff cmp0xff changed the title from "fix: #718 only drop TimestampSeries" to "refactor: #718 only drop TimestampSeries" Jul 13, 2025
@cmp0xff cmp0xff force-pushed the hotfix/cmp0xff/gh718-drop-tss branch from c81cd6e to d5e1089 Compare July 16, 2025 17:54
@cmp0xff cmp0xff marked this pull request as draft July 16, 2025 17:56
@cmp0xff cmp0xff force-pushed the hotfix/cmp0xff/gh718-drop-tss branch from d5e1089 to 41c7015 Compare July 16, 2025 18:19
@cmp0xff cmp0xff marked this pull request as ready for review July 16, 2025 20:11
@cmp0xff cmp0xff force-pushed the hotfix/cmp0xff/gh718-drop-tss branch 2 times, most recently from abf9147 to cbbd372 Compare July 17, 2025 15:10
@Dr-Irv
Collaborator

Dr-Irv commented Jul 23, 2025

@cmp0xff you have a number of PRs submitted while I was out on vacation for 2 weeks. Can you let me know which ones I should prioritize for review?

@cmp0xff
Contributor Author

cmp0xff commented Jul 23, 2025

Hi @Dr-Irv, I hope you had a nice vacation. My pull requests are categorised below. Each category is independent, but those in a higher position have a slightly higher priority in my opinion.

Series: arithmetic operations

The following two PRs are independent. They migrate test_series.py to a subfolder series, and add quite a few test_*.py files there.

DataFrame.to_dict

Index.append

Series: address #718

  1. refactor: #718 only drop TimestampSeries #1274 - this is a prerequisite for the next one.
  2. refactor: #718 also drop TimedeltaSeries #1273

Collaborator

@Dr-Irv Dr-Irv left a comment

Thanks for doing this. It's a lot of good work.

Main thing - if I'm going to merge this PR, it needs to be in a state where we don't need the followup PR.

Basic rule - we don't put ignore in the tests unless we are testing that the stubs should not accept something that is invalid. You have places where you have added ignore in the tests and I won't merge that in (unless we know it is a bug in the type checker)
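
As a rough illustration of this rule, the sketch below uses a hypothetical function `double` and a locally defined `TYPE_CHECKING_INVALID_USAGE` flag as a stand-in for the one the test suite provides. Valid calls carry no ignore comment; only the call that the annotations are supposed to reject does, and it sits under the flag so it never runs.

```python
# Stand-in for the flag used in the test suite (an assumption for this sketch).
# Because it is False, the guarded block is type-checked but never executed.
TYPE_CHECKING_INVALID_USAGE = False


def double(x: int) -> int:
    """Hypothetical function under test."""
    return 2 * x


def test_double() -> int:
    # Valid usage: no ignore comment, so a checker error here is a real bug.
    result = double(21)

    if TYPE_CHECKING_INVALID_USAGE:
        # Invalid usage: the ignores document that checkers must reject this.
        double("21")  # type: ignore[arg-type] # pyright: ignore[reportArgumentType]

    return result
```

An ignore anywhere outside such a guarded block would hide a mismatch between the stubs and valid code, which is what the rule is meant to prevent.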

@Dr-Irv
Collaborator

Dr-Irv commented Jul 24, 2025

Hi @Dr-Irv, I hope you had a nice vacation. My pull requests are categorised below. Each category is independent, but those in a higher position have a slightly higher priority in my opinion.

I've reviewed them all, except #1273 as noted there.

Thanks for all the great work.

@cmp0xff
Contributor Author

cmp0xff commented Jul 24, 2025

I've reviewed them all, except #1273 as noted there.

Thanks for all the great work.

Thank you very much for your quick and thorough reviews. I will be able to work on them next week.

@cmp0xff cmp0xff force-pushed the hotfix/cmp0xff/gh718-drop-tss branch from cbbd372 to ed69ec5 Compare July 28, 2025 15:05
@cmp0xff cmp0xff marked this pull request as draft July 30, 2025 07:52
@cmp0xff cmp0xff force-pushed the hotfix/cmp0xff/gh718-drop-tss branch from b095af2 to f1cf19f Compare August 4, 2025 21:36
@cmp0xff
Contributor Author

cmp0xff commented Aug 20, 2025

Before marking this PR as "ready for review", I can probably still clean up the sub family, make some simplifications, and homogenise __sub__ and sub.

@cmp0xff cmp0xff marked this pull request as ready for review August 21, 2025 10:12
@cmp0xff cmp0xff requested a review from Dr-Irv August 21, 2025 10:13
Comment on lines 173 to 174
check(assert_type(left_ts.rsub(s), pd.Series), pd.Series, pd.Timedelta)
check(assert_type(left_ts.rsub(a), pd.Series), pd.Series, pd.Timedelta)
Contributor Author

datetime - Series[Any] can either be timedelta-like or datetime-like, depending on Any. I would not give an exact type here.

Collaborator

Yes, that makes sense.
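
To illustrate why no exact element type fits here, the runtime sketch below subtracts two differently typed Series from the same `datetime` via `rsub`. The variable names are illustrative, and the snippet assumes pandas is installed.

```python
from datetime import datetime

import pandas as pd

anchor = datetime(2025, 1, 2)
stamps = pd.Series([pd.Timestamp("2025-01-01")])
deltas = pd.Series([pd.Timedelta(days=1)])

# rsub computes `anchor - series`.
from_stamps = stamps.rsub(anchor)  # datetime - datetime-like -> timedelta-like
from_deltas = deltas.rsub(anchor)  # datetime - timedelta-like -> datetime-like

print(from_stamps.dtype.kind)  # "m" marks a timedelta64 dtype
print(from_deltas.dtype.kind)  # "M" marks a datetime64 dtype
```

With an unparameterised Series on the right-hand side, a checker cannot tell which of the two cases applies, so a plain `pd.Series` is the most it can promise.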

) -> TimestampProperties: ...
@overload
def __get__(
self, instance: Series[Timedelta], owner: Any
Collaborator

Use TimedeltaSeries here, or TimedeltaSeries | Series[Timedelta], and then change it in the PR that removes TimedeltaSeries.

Comment on lines 2247 to 2266
def __mul__(
self, other: timedelta | Timedelta | TimedeltaSeries | np.timedelta64
self: Series[bool],
other: timedelta | np.timedelta64 | np_ndarray_td | TimedeltaSeries,
) -> TimedeltaSeries: ...
@overload
def __mul__(self: Series[bool], other: Series[Timedelta]) -> Series[Timedelta]: ... # type: ignore[overload-overlap]
@overload
def __mul__(
self: Series[int],
other: timedelta | np.timedelta64 | np_ndarray_td | TimedeltaSeries,
) -> TimedeltaSeries: ...
@overload
def __mul__(self: Series[int], other: Series[Timedelta]) -> Series[Timedelta]: ...
@overload
def __mul__(
self: Series[float],
other: timedelta | np.timedelta64 | np_ndarray_td | TimedeltaSeries,
) -> TimedeltaSeries: ...
@overload
def __mul__(self: Series[float], other: Series[Timedelta]) -> Series[Timedelta]: ...
Collaborator

Is it possible to combine these overloads like this?

    @overload
    def __mul__(
        self: Series[bool] | Series[int] | Series[float],
        other: timedelta | np.timedelta64 | np_ndarray_td | TimedeltaSeries,
    ) -> TimedeltaSeries: ...
    @overload
    def __mul__(self: Series[bool] | Series[int] | Series[float], other: Series[Timedelta]) -> Series[Timedelta]: ...  # type: ignore[overload-overlap]

def median(
self,
self: Series[float],
Collaborator

Suggested change
- self: Series[float],
+ self: Series[float] | Series[int]

And then for Series[bool], median() returns np.floating.

Comment on lines +3975 to +3982
@overload
def to_numpy(
self,
dtype: DTypeLike | None = None,
copy: bool = False,
na_value: Scalar = ...,
**kwargs,
) -> np_1darray: ...
Collaborator

I don't think you need this overload because IndexOpsMixin has it. Then you can get rid of the ignores in _SeriesSubClassBase

@cmp0xff cmp0xff marked this pull request as draft August 21, 2025 19:43
@cmp0xff cmp0xff mentioned this pull request Aug 21, 2025
1 task
Collaborator

@Dr-Irv Dr-Irv left a comment

I looked at the most recent version and found one new issue. A few issues from the previous review are also still open. Ping me when you want me to look at it again.

# checking, where our `__radd__` cannot override. At runtime, they return
# `Series`s.
if TYPE_CHECKING_INVALID_USAGE:
assert_type(i + left, "npt.NDArray[np.int64]")
Collaborator

Any assert_type() inside the if TYPE_CHECKING_INVALID_USAGE block has to have a # type: ignore on it.

If it fails at runtime, and we can't detect it (which is the case here, I think), then just comment out the test and include a comment indicating why this can't be tested.
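
The mismatch between the static and the runtime result can be sketched with two hypothetical classes, `Left` and `Right`, which stand in for the NumPy array and `Series` by analogy only (this is not NumPy's actual mechanism). A checker is expected to find a matching `Left.__add__` and stop there; at runtime `Left.__add__` returns `NotImplemented`, so Python falls back to `Right.__radd__`.

```python
from __future__ import annotations


class Left:
    # The annotation accepts any operand, so a checker resolves
    # `Left() + Right()` here and does not consult `Right.__radd__`.
    def __add__(self, other: object) -> Left:
        if isinstance(other, Right):
            return NotImplemented  # defer to the right operand at runtime
        return self


class Right:
    def __radd__(self, other: Left) -> Right:
        return self


result = Left() + Right()
print(type(result).__name__)  # Right
```

Statically the expression has the type declared by `Left.__add__`, while the object produced at runtime is a `Right`, which is why a test for such a case cannot be expressed with `assert_type`.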

@cmp0xff
Contributor Author

cmp0xff commented Sep 2, 2025

Hi @Dr-Irv , before trying to resolve the comments, I would like to ask your opinion on the progress.

In this PR we need Series[Timestamp] - Series[Timestamp] -> Series[Timedelta]. It seems to me that, for mypy, this is incompatible with Series[Any] - Series[Timestamp] -> Never (or error) and Series[Timestamp] - Series[Any] -> Never (or error). This may be related to python/mypy#19525.

For now, all tests for Series[Any] - Series[Timestamp] -> Never or Series[Timestamp] - Series[Any] -> Never are disabled.

What do you think? Shall we continue with the current effort, at the price of maybe dropping the -> Never's?

@Dr-Irv
Collaborator

Dr-Irv commented Sep 2, 2025

For now, all tests for Series[Any] - Series[Timestamp] -> Never or Series[Timestamp] - Series[Any] -> Never are disabled.

What do you think? Shall we continue with the current effort, at the price of maybe dropping the -> Never's?

A couple of questions:

  1. If you enable those tests that you disabled, do they pass with pyright, but only fail with mypy?
  2. If you were to move the overloads for Series.__sub__(self: Series[Timestamp], ...) to before the overloads for Series[Never], does that fix the issue for mypy? Or does it cause other tests to fail?

@cmp0xff
Contributor Author

cmp0xff commented Sep 2, 2025

1. If you enable those tests that you disabled, do they pass with `pyright`, but only fail with `mypy` ?

Yes, the errors are mypy-specific.

2. If you were to move the overloads for `Series.__sub__(self: Series[Timestamp], ...)` to before the overloads for `Series[Never]`, does that fix the issue for `mypy`? Or does it cause other tests to fail?

Let's say we have the following two as the first two overloads of __sub__:

    @overload
    def __sub__(
        self: Series[Timestamp], other: Series[Any]
    ) -> Never: ...
    @overload
    def __sub__(
        self: Series[Timestamp], other: datetime | np.datetime64 | np_ndarray_dt
    ) -> TimedeltaSeries: ...
  • Series[Any] - Series[Timestamp] now gives Never by pyright and mypy
  • Unfortunately, Series[Any] - Series[Any] now also gives Never by both pyright and mypy, so does Series[Any] - Series[bool], etc.

@Dr-Irv
Collaborator

Dr-Irv commented Sep 3, 2025

1. If you enable those tests that you disabled, do they pass with `pyright`, but only fail with `mypy` ?

Yes, the errors are mypy-specific.

OK, so I say leave the tests in, put a # type: ignore for mypy on those tests, and include a comment saying that mypy is not processing this correctly, as evidenced by pyright.

Do the same issues occur with Series.sub()?

  • Series[Any] - Series[Timestamp] now gives Never by pyright and mypy
  • Unfortunately, Series[Any] - Series[Any] now also gives Never by both pyright and mypy, so does Series[Any] - Series[bool], etc.

OK, so much for that idea. I think I understand why.

@cmp0xff
Contributor Author

cmp0xff commented Sep 4, 2025

OK, so I say leave the tests in, put a # type: ignore for mypy on those tests, and include a comment saying that mypy is not processing this correctly, as evidenced by pyright.

Hi @Dr-Irv, after some thought, my feeling now is that this behaviour is a pyright bug, not a mypy issue.

Consider the following example:

from __future__ import annotations
from typing import Any, overload, Generic, reveal_type
from typing_extensions import TypeVar, Never, Self

T = TypeVar("T", bound=int, default=Any)

class Se(Generic[T]):
    @overload
    def __sub__(self: Se[int], other: Se[int]) -> Never: ...
    @overload
    def __sub__(self, other: Self) -> Self: ...

def foo(a: Se[int]) -> Se[int]: ...

reveal_type(foo(Se[Any]()))  # mypy: Se[int]; pyright: Se[int], no typing error

reveal_type(Se[Any]() - Se[Any]())  # mypy: Any, pyright: Never

The first reveal_type basically says that whenever Se[specific] is accepted, Se[Any] should also be accepted. This is in accordance with people's arguments in python/mypy#19525.

If we agree with what I wrote above, the two overloads of __sub__ are indeed ambiguous for Se[Any]. mypy correctly shows Any because of the ambiguity, whereas pyright somehow does not detect the problem. By the same reasoning, Series[Any] - Series[Timestamp] -> Never and Series[Any] - Series[Any] -> Series[Any] are incompatible.

What do you think?

@Dr-Irv
Collaborator

Dr-Irv commented Sep 4, 2025

What do you think?

I think that the typing spec is unclear with respect to how overloads get matched with generic types and defaults.

If you have a variable that has type Se[Any], then it could match Se[int].

Conversely, if you have a variable that has type Se[int], then it could match Se[Any].

I don't think that mypy should be returning Any, because there is no overload that returns Any.

Consider this modification of your example, where I took out the default value of the TypeVar:

from __future__ import annotations
from typing import (
    Any,
    overload,
    Generic,
    Never,
    reveal_type,
    Self,
)
from typing_extensions import TypeVar


T = TypeVar("T", int, str)


class Se(Generic[T]):
    @overload
    def __sub__(self: Se[int], other: Se[int]) -> Never: ...
    @overload
    def __sub__(self, other: Self) -> Self: ...
    def __sub__(self, other: Any) -> Self:
        return self


def foo(a: Se[int]) -> Se[int]:
    return Se[int]()


def unknown(a: list[Se]) -> Se:
    reveal_type(a[0])
    return a[0]


def t1() -> None:
    seany = Se[Any]()
    reveal_type(seany)
    fooseany = foo(seany)
    reveal_type(fooseany)

    sub = seany - Se[Any]()
    reveal_type(sub)


def t2() -> None:
    seunk = unknown([Se[Any]()])
    reveal_type(seunk)
    foosunk = foo(seunk)
    reveal_type(foosunk)
    sub = seunk - seunk
    reveal_type(sub)

With pyright, the first reveal_type is Series[Unknown], but with mypy it is Series[Any]. pyright is making a distinction between Series[Unknown] and Series[Any]. I don't think mypy is doing that.

Here's something to try. What if we created a class PUnknown: (for "pandas unknown")

class PUnknown:
    pass

Then the TypeVar for S1 would have default=PUnknown and S1 could include PUnknown. Then whenever we have methods that return a Series that has unknown type, it will be Series[PUnknown]. I think (but I'm really not sure) that might be treated differently than Series[Any] by mypy.

@cmp0xff
Contributor Author

cmp0xff commented Sep 4, 2025

Let me take your example, but restrict to the case I think we are really concerned with.

from __future__ import annotations
from typing import (
    Any,
    overload,
    Generic,
    Never,
    reveal_type,
    Self,
)
from typing_extensions import TypeVar

T = TypeVar("T", int, str)

class Se(Generic[T]):
    @overload
    def __sub__(self: Se[int], other: Se[int]) -> Never: ...
    @overload
    def __sub__(self, other: Self) -> Self: ...
    def __sub__(self, other):
        return self

def foo(a: Se[int]) -> Se[int]:
    return Se[int]()

def t1() -> None:
    seany = Se[Any]()

    sub = seany - Se[Any]()
    reveal_type(sub)  # mypy: Any, pyright: Never.
    # Mypy thinks both Se[int] - Se[int] -> Never and Self - Self -> Self apply. The result is ambiguous, so mypy gives Any.
    # Pyright does not take into account the second overload.

I believe this is essentially the same as example 4 in the Python typing documentation for overloads, excerpted below:

from typing import Any, overload, reveal_type

@overload
def example4(x: list[int], y: int) -> int: ...
@overload
def example4(x: list[str], y: str) -> int: ...
@overload
def example4(x: int, y: int) -> list[int]: ...

def test(v1: list[Any], v2: Any):
    # Step 2 eliminates the second overload. Step 5
    # determines that first and third overloads
    # both apply and are ambiguous due to Any, and
    # the return types are inconsistent.
    r2 = example4(v2, 1)
    reveal_type(r2)  # Should reveal Any

Above, mypy indeed gives Any, whereas pyright gives Unknown. If following this documentation is the right way to go, pyright deviates from it. Moreover, the first and the third overloads are indeed incompatible: removing either one makes the call resolve to the return type of the other.

What I want to argue is that Series[Any] - Series[Timestamp] -> Never and Series[Any] - Series[Any] -> Series[Any] are incompatible, if we follow the documentation of the python typing team.

@Dr-Irv
Collaborator

Dr-Irv commented Sep 4, 2025

What I want to argue is that Series[Any] - Series[Timestamp] -> Never and Series[Any] - Series[Any] -> Series[Any] are incompatible, if we follow the documentation of the python typing team.

Yes, but I think we can get around this by never having a type corresponding to Series[Any] using the PUnknown idea I listed above.

@cmp0xff
Contributor Author

cmp0xff commented Sep 4, 2025

Yes, but I think we can get around this by never having a type corresponding to Series[Any] using the PUnknown idea I listed above.

If I understand correctly, we can do the experiment below (it is a .pyi file rather than a .py file, so no implementation is needed):

from typing import Any, overload, Generic, reveal_type
from typing_extensions import TypeVar, Never, Self

class PUnknown: ...

T = TypeVar("T", bound=int | PUnknown, default=PUnknown)

class Se(Generic[T]):
    @overload
    def __sub__(self: Se[int], other: Se[int]) -> Never: ...
    @overload
    def __sub__(self, other: Self) -> Self: ...

def foo(a: Se[int]) -> Se[int]: ...

reveal_type(foo(Se[PUnknown]()))  # mypy, pyright: cannot assign

reveal_type(Se[PUnknown]() - Se[PUnknown]())  # mypy, pyright: Se[PUnknown]

I think it would be a big change, worthy of a separate PR, if we do it. In particular, foo(Se[PUnknown]()) now gives an assignment error, because int is not a subtype of PUnknown. In the current approach, in contrast, int is a subtype of Any.

@Dr-Irv
Collaborator

Dr-Irv commented Sep 4, 2025

I think it would be a big change, worthy of a separate PR, if we do it. In particular, foo(Se[PUnknown]()) now gives an assignment error, because int is not a subtype of PUnknown. In the current approach, in contrast, int is a subtype of Any.

I played with the idea. Seems like a lot of work. The subtype argument is what makes it a problem. So let's not go down that path.

Above, mypy indeed gives Any, whereas pyright gives Unknown. If following this documentation is the right way to go, pyright deviates from it. Moreover, the first and the third overloads are indeed incompatible: removing either one makes the call resolve to the return type of the other.

I actually don't think that pyright is wrong by saying Unknown. It's saying it couldn't make a match that would resolve the type. I like that pyright says Unknown vs. Any because Any is different (and can be declared in a type declaration), whereas Unknown says that the code is ambiguous.

I wish the typing spec allowed you to have Unknown as a type that was treated differently with respect to generics.

What I want to argue is that Series[Any] - Series[Timestamp] -> Never and Series[Any] - Series[Any] -> Series[Any] are incompatible, if we follow the documentation of the python typing team.

I see your point. So where does that leave us? What are our options?

This might be an argument for keeping TimestampSeries and TimedeltaSeries. One way to look at it is that for Index we have DatetimeIndex and TimedeltaIndex, so maybe TimestampSeries and TimedeltaSeries are analogous?
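
For reference, the analogy can be checked at runtime: pandas has dedicated `Index` subclasses for datetime and timedelta data, while the corresponding `Series` objects are plain `Series` distinguished only by dtype. A small sketch, assuming pandas is installed (variable names are illustrative):

```python
import pandas as pd

dt_index = pd.DatetimeIndex(["2025-01-01", "2025-01-02"])
td_index = pd.TimedeltaIndex(["1 day", "2 days"])

# Wrapping the same data in a Series does not produce a dedicated subclass.
dt_series = pd.Series(dt_index)
td_series = pd.Series(td_index)

print(type(dt_index).__name__, type(td_index).__name__)  # DatetimeIndex TimedeltaIndex
print(type(dt_series) is pd.Series, type(td_series) is pd.Series)  # True True
```

So the Index classes exist at runtime, whereas TimestampSeries and TimedeltaSeries exist only in the stubs.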

@cmp0xff
Contributor Author

cmp0xff commented Sep 6, 2025

So where does that leave us? What are our options?

The following is my understanding:

  1. We can keep TimestampSeries and TimedeltaSeries.
    • Pro: We are able to keep Series[Any] - TimestampSeries -> Never, etc., while keeping Series[Any] - Series[Any] -> Series[Any].
    • Con: TimestampSeries and TimedeltaSeries are quite unintuitive. Even as a contributor to pandas-stubs who knows about TimestampSeries, I am still reluctant to look up how to import it when I just need to cast. It would be much easier for users to be able to use pd.Series[pd.Timestamp].
  2. We can drop TimestampSeries in this PR, and keep Series[Any] - Series[Timestamp] -> Never just for pyright.
    • Pro: Our philosophy is kept within pyright
    • Con: mypy does not agree. Furthermore, as discussed above, keeping Series[Any] - Series[Timestamp] -> Never violates the rules described by the Python typing team, a deviation that pyright may need to fix in the future.
  3. We can drop TimestampSeries in this PR, as well as make Series[Any] - Series[Timestamp] -> Series[Any].
    • Pros:
      • Results are consistent for pyright and mypy
      • Results are consistent with simpler types. With Python's native types, Any - datetime.datetime gives Any instead of Never. If pandas-stubs follows this example, Series[Any] - Series[Timestamp] -> Series[Any] seems legitimate to me.
        from typing import Any, reveal_type
        from datetime import datetime
        
        import pandas as pd
        
        a: Any
        
        reveal_type(a - datetime(2025, 1, 1))  # Any
        reveal_type(a - pd.Timestamp(2025, 1, 1))  # Any
    • Con: we need to change our philosophy.
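
The cast mentioned under option 1 might look like the sketch below if `Series[Timestamp]` were the spelling users write. The annotation is given as a string so that nothing is subscripted at runtime; whether a given stubs release accepts it depends on the outcome of this discussion, so treat that part as an assumption.

```python
from typing import cast

import pandas as pd

raw = pd.Series(pd.to_datetime(["2025-01-01", "2025-01-02"]))

# cast() only informs the type checker; it returns `raw` unchanged at runtime.
stamps = cast("pd.Series[pd.Timestamp]", raw)

print(stamps is raw)  # True
```

No extra import is needed for the target type, which is the convenience option 1 gives up.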
